SPHERICAL HARMONIC ANALYSIS AND SOME APPLICATIONS TO SURROUND SOUND

P.S. Gaskell, M.A.



1. Introduction 



The two and three dimensional characteristics 
of the apparent disposition and resolution of sound 
sources in a sound field, as observed at a point in 
space, may be described by azimuthal and spherical 
harmonics. Spherical harmonic analysis furnishes a 
powerful method for studying surround sound and 
the problems of transmitting and reproducing it. 
A number of papers demonstrating this have
appeared in the literature. This Report gives
a basic description of this analysis and discusses a 
number of applications relevant to surround sound 
transmission. 

For largely historical reasons, symbols des- 
cribing parameters in this subject have appeared in 
the past with a notable lack of consistency and this 
has led to some confusion. An attempt is made in 
this Report to resolve some of the discrepancies 
although a standard of terminology is unlikely to 
be achieved easily. 

The problems of transmitting and reproducing 
surround sound are very intricate and are due largely 
to the sophistication of the human hearing system. 
It should be remembered then that when designing 
particular transmission systems, theoretical analysis 
must be complemented by a measure of pragmatism 
in order to arrive at a practical solution. 



2. Surround sound transmission 

2.1 Azimuthal harmonic analysis 

At any point in a sound field, sounds arrive from 
an infinity of directions. The net effect may be 
thought of as the sum of sounds emanating from a 
large number of point sources. If we consider just
one point source of unit magnitude at azimuth
$\theta_i$ in the horizontal plane, its azimuthal spatial
distribution appears to the observer as a delta
function that repeats itself every $2\pi$ radians. In the
same way that repetitive time signals may be
represented by a Fourier series expansion, the azimuthal
distribution $g(\theta)$ of a group of sources may be
expressed by

\[ g(\theta) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos n\theta + b_n \sin n\theta \right). \]



For a point source of unit magnitude 

\[ g(\theta) = \delta(\theta - \theta_i) \]

Evaluating the coefficients $a_n$ and $b_n$ for a
point source in the same way as for a Fourier
series expansion gives

\[ g(\theta) = \frac{1}{2\pi} + \frac{1}{\pi} \sum_{n=1}^{\infty} \left( \cos n\theta_i \cos n\theta + \sin n\theta_i \sin n\theta \right) \]



Summing over all $i$, the complete ensemble of point
sources is retrieved. Terms in $\cos n\theta$ and $\sin n\theta$
are said to be the azimuthal harmonics of order $n$.

Schemes for the transmission of surround
sound in the horizontal plane, based on this form
of harmonic analysis, have been described in earlier
papers. These have shown that, for the transmission
of surround sound of order $n$ in the
horizontal plane, $(2n + 1)$ channels are needed. In
this Report, mainly first order transmission will be
considered. Thus for a point source, the first (and
zero) order terms are given by:

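\[ g(\theta) = \frac{1}{2\pi} + \frac{1}{\pi} \left( \cos\theta_i \cos\theta + \sin\theta_i \sin\theta \right)
 = \frac{1}{\pi} \left[ \tfrac{1}{2} + \cos(\theta - \theta_i) \right] \]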



The polar diagram of $g(\theta)$ is then that of a 120°
hypercardioid directed towards the source azimuth
$\theta = \theta_i$ (see Fig. 1). Three transmission channels
are needed, one carrying signals with an omni-directional
characteristic, and the other two carrying
signals with figure-of-eight characteristics.*
The magnitude of the zero order component is
$1/2\pi$ and the magnitudes of the two first order
components, cos and sine, are $(1/\pi)\cos\theta_i$ and $(1/\pi)\sin\theta_i$
respectively. For the general case of an ensemble
of sources with azimuthal distribution $g(\theta)$, the
zero order component magnitude is

* Note that, when describing the polar characteristics of
microphones or sound fields, the common practice of taking the
modulus of the azimuthal variation is followed here. Thus $r = \cos\theta$
or $\sin\theta$ each has a figure-of-eight characteristic; if the modulus
is not taken, these would each represent only a single circle.



(PH-201) 



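\[ \frac{1}{2\pi} \int_0^{2\pi} g(\theta)\, d\theta \]

and the magnitudes of the $\cos\theta$ and $\sin\theta$
components are respectively

\[ \frac{1}{\pi} \int_0^{2\pi} g(\theta) \cos\theta\, d\theta
 \quad \text{and} \quad
 \frac{1}{\pi} \int_0^{2\pi} g(\theta) \sin\theta\, d\theta \]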
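The three component magnitudes can be checked numerically. The following is a minimal sketch, assuming the normalisation $1/2\pi$ for the zero order term and $1/\pi$ for the first order terms; the function names, the midpoint integration rule and the narrow Gaussian used to stand in for a point source are illustrative choices, not part of the Report.

```python
import math

def first_order_components(g, steps=20000):
    """Integrate g over 0..2*pi (midpoint rule) and return the
    zero order, cos and sin component magnitudes."""
    d_theta = 2.0 * math.pi / steps
    zero = cos_c = sin_c = 0.0
    for k in range(steps):
        theta = (k + 0.5) * d_theta
        value = g(theta)
        zero += value * d_theta
        cos_c += value * math.cos(theta) * d_theta
        sin_c += value * math.sin(theta) * d_theta
    return zero / (2.0 * math.pi), cos_c / math.pi, sin_c / math.pi

def narrow_source(theta_i, width=0.02):
    """Unit-area Gaussian bump approximating a point source at theta_i
    (an assumption made only so that the integral can be sampled)."""
    def g(theta):
        # Wrap the angular difference into (-pi, pi].
        diff = math.atan2(math.sin(theta - theta_i), math.cos(theta - theta_i))
        return math.exp(-0.5 * (diff / width) ** 2) / (width * math.sqrt(2.0 * math.pi))
    return g

theta_i = math.radians(60.0)
zero, cos_c, sin_c = first_order_components(narrow_source(theta_i))
print(zero, cos_c, sin_c)
```

For a source that is narrow compared with a radian, the three values should approach $1/2\pi$, $(1/\pi)\cos\theta_i$ and $(1/\pi)\sin\theta_i$.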



The first order, horizontal, azimuthal component
magnitudes at a point in a sound field may
be sampled by means of first order microphones.
These are widely used at present by recording and
broadcasting companies and they have a general
azimuthal response to a point source at azimuth $\theta_i$
of the form

\[ a_0 + a_1 \cos\theta_i + b_1 \sin\theta_i \]

or, if its maximum is directed towards $\theta = 0$,

\[ a_0 + a_1 \cos\theta_i \]

Responses of first order microphones with various
ratios of $a_0/a_1$ are shown in Fig. 1. Clusters of
three or more microphones of this type can resolve
the zero and first order components of the sound
field at a point by the use of a suitable matrix
(see Sections 3 and 5.2). Electronically panned
signals or highly directional microphones, however,
may contain higher order components of the form
$a_n \cos n\theta_i$ and $b_n \sin n\theta_i$ ($n > 1$) (see Section 4).

In three dimensions, the azimuthal harmonics
become spherical harmonics and the expansion takes
the general form

\[ g(\theta, \phi) = \sum_{n=0}^{\infty} Y_n(\theta, \phi) \]

where

\[ Y_n(\theta, \phi) = a_{n0} P_n(\cos\phi) + \sum_{m=1}^{n} \left[ a_{nm} \cos m\theta + b_{nm} \sin m\theta \right] P_n^m(\cos\phi), \]

$\theta$, $\phi$ are the standard polar co-ordinates
and $P_n^m$ are the associated Legendre polynomials.

Fig. 1 - Cardioid-type characteristics (polar diagrams, linear scale: cardioid, 135° hypercardioid, 120° hypercardioid, figure of eight)

2.2 Original, programme and reproduction sound fields

Having resolved the azimuthal (or spherical)
harmonics, their recorded component magnitudes
are sent down transmission channels. Each channel
is notionally labelled according to its azimuthal
order, on the basis that the channel signal will be
ideally decoded and reproduced according to its
proper order. In the horizontal plane, "ideal"
decoding implies that the "zero order, omni-directional"
channel signal,

\[ \frac{1}{2\pi} \int_0^{2\pi} g(\theta)\, d\theta, \]

is reproduced as a true omni-directional component
in the reproduced sound field, the "$\cos\theta$" channel
signal,

\[ \frac{1}{\pi} \int_0^{2\pi} g(\theta) \cos\theta\, d\theta, \]

is correctly reproduced as a forward-facing figure-of-eight
component and so on. The sum of these
components may be called the "programme" sound
field.

In normal programme production, it frequent- 
ly occurs that the "programme" sound field differs 
significantly from the original sound field. In 
other words, it is unusual for the "programme" 
sound field to correspond to the sound field at any 
point in the space of the enclosure in which the 
recording was made. This results from the use of 
directional microphones, electronic panning, arti- 
ficial reverberation and any of the other options 
that may be used in the course of a recording 
session. This is done not only to synthesise artificial 
sounds, but also to achieve sound balance, clarity, 
perspective, etc., and is standard practice for
normal programme production. Thus the "pro- 
gramme" sound field is likely to differ significantly 
from the original sound field even for programmes 
that are intended to sound "natural and life-like". 

Unfortunately, the "programme" sound field 
remains an unrealised goal in so far as methods of 
achieving "ideal" decoding have yet to be dis- 
covered. Practical decoders, limited by the number 
of loudspeakers used and by the present incomplete 
understanding of the human hearing system, give 
"reproduction" sound fields that deviate from the 
ideal. This is discussed in Section 2.4. 

2.3 Transmission 

For transmission, the order of directional infor- 
mation that may be conveyed depends on the number 
of channels available. Coding of the signals
often takes place, prior to transmission, to achieve 
compatibility (mono/stereo, etc.) and/or if it is in- 
tended to convey high order information with only 
a limited number of channels. This may be done 
by a modulation process or by introducing inter- 
channel phase. Coding using interchannel phase 
may cause complex crosstalk between the azimuthal 
components. Thus, when the signals are decoded 
and broken down into their component parts, the 
zero order term for example may contain traces of 
higher order components. Obviously, coding by 
multiplexing is capable of maintaining independent 
signal components, though at an obvious cost. 

2.4 Decoding and reproduction 

The ear/brain ensemble is capable of high 
resolution (typically 4° image width) and a very 
large number of loudspeakers and transmission 
channels may be needed to simulate the sound field 
exactly. Fortunately, however, the ear is interpretive.
When presented with a sound field of
low azimuthal order, evidence suggests that the 
ear interprets it as one of higher order depending to 
some extent on the source azimuth. Thus it 
perceives sound images that are sharper than those 
which the azimuthal harmonic content would 
dictate from a theoretical point of view. If a 
limited number of loudspeakers and transmission 
channels are used, the ear can be deceived into 
believing that a fair approximation to the intended
"programme" sound field has been reproduced. 
This, however, requires an intimate understanding 
of the hearing system and this is both highly 
sophisticated and complex. 

Over the past seventy years or more, a great 
deal of research on the functioning of the hearing 
system has been documented. It appears that the 
ear/brain ensemble uses many different mechanisms
for judging sound quality and localising sounds. 
Not all of these mechanisms are clearly understood 
and we are still left with an incomplete under- 
standing of the overall system. However, by 
satisfying as many aural conditions as possible in 
the reproduced sound field, the number of conflicting
auditory cues that have to be resolved by
the brain, when interpreting the sounds, is minimised.

Practical decoding systems using a limited 
number of loudspeakers, in a listening room with 
its own acoustic characteristics, are not able to 
reproduce the "programme" sound field exactly, 
even to a limited order. The perceived sound 
field is liable to be distorted, particularly if the 
listener turns his head or moves about the room. 
Each encoding format may be complemented, 
therefore, by a variety of different decoding systems, 
each with its own characteristic deviations from the 
ideal sound field. The choice of decoder depends 
to some extent on the particular requirements of 
the listener. 

In programme production, this leads to a 
problem. Whenever a programme is balanced, a 
particular decoding system is used to monitor the 
programme and idiosyncrasies of the decoding 
system (such as tonal quality or localisation) 
are in many cases compensated for by the pro- 
gramme producer at this stage. Replaying the 
final programme on the same decoder gives results
similar to those intended by the producer, but
other decoders give different results. This
difficulty is particularly evident in the case of
2-channel quadraphonic encodings where either
"linear" or "logic" decoders may be used.

Gerzon has designed linear surround
sound decoders for 2-, 3- and 4-channel transmission
systems (see Appendix 2). For a transmission
system designed to reproduce the azimuthal
characteristics to first order, he shows that the 
loudspeaker feeds should only consist of first and 
zero order terms; the artificial generation and 
insertion of higher order terms (e.g. by pair-wise 
panning) only impairs reproduction. Decoders 
based on these criteria have given successful results. 
However, other transmission schemes (see Section 
4) can be devised whereby signals containing higher 
order components are transmitted and decoded to 
give reproduction of higher order components at 
selected positions in the sound stage (e.g. single 
loudspeaker excitation). Both approaches will be 
discussed in the following sections. 

Stereo and mono reproduction map the 
original sound stage to a limited arc of the re- 
production sound stage. As with surround sound, 
the best way of decoding the recorded signals for 
stereo reproduction is closely linked to the hearing 
system. This has been discussed in other papers 
and will only be mentioned briefly here (see 
Appendix 2). 

3. M, S and T components of a sound field 

Symbols widely used in discussions on sound 
transmission are M, S and T*. These may be 
defined as the magnitudes of the first order, 
horizontal, azimuthal components of a sound 
field at a point within it; M is the magnitude of 
the omni-directional (pressure) component and S 
and T are the magnitudes of two figure-of-eight 
(velocity) components, one directed towards centre- 
left and the other towards centre-front respectively. 
These symbols are also used to represent signals 
obtained from microphones whose magnitudes are 
proportional to the magnitudes of the M, S and T 
components of a sound field. Thus, as was seen 
in Section 2.1, the normalised values of M, S and T 
for a sound field generated by a point source at 
azimuth $\theta_i$ are given by

\[ M = \frac{1}{2\pi}, \qquad S = \frac{1}{\pi} \sin\theta_i, \qquad T = \frac{1}{\pi} \cos\theta_i \]

Given an ensemble of sources with azimuthal
distribution $g(\theta)$,

\[ M = \frac{1}{2\pi} \int_0^{2\pi} g(\theta)\, d\theta, \qquad
   S = \frac{1}{\pi} \int_0^{2\pi} g(\theta) \sin\theta\, d\theta \qquad \text{and} \qquad
   T = \frac{1}{\pi} \int_0^{2\pi} g(\theta) \cos\theta\, d\theta \]

For a single source, the ratio of M to the
peak values of S and T, $S_{peak}$ and $T_{peak}$, is 1/2.
This ratio changes, however, with more than one
source according to the source distribution $g(\theta)$.
It is often convenient therefore to introduce a
constant gain factor into the M channel for the
purposes of signal handling and transmission (see
Section 7).

The values of M, S and T may be derived
from a minimum of three first order microphones
suitably disposed. A convenient, but less efficient
method is to use four coincident, orthogonal
cardioid or hypercardioid microphones directed
towards the four corners of a square, $L_F$, $R_F$, $L_B$
and $R_B$, using conventional notation (L = left,
R = right, F = front, B = back). The use of four such
cardioid microphones will now be discussed.

The general response of a cardioid-type
microphone, with its maximum directed towards
$\theta = 0$, to a point source at azimuth $\theta_i$ is

\[ V(\theta_i) = A + \cos\theta_i \]

where A = constant.

* Unfortunately, the symbols M and S are also widely used to
describe the Sum and Difference signals of F.M. stereophonic
transmissions where they have a different meaning to that in
the present context.



If $\theta_i$ is measured anti-clockwise from centre-front,
the responses of four cardioid microphones,
directed towards the corners of a square, are given
by:

\[
\begin{aligned}
L_F &= A + \cos(\theta_i - 45^\circ) \\
R_F &= A + \cos(\theta_i + 45^\circ) \\
L_B &= A + \cos(\theta_i - 135^\circ) \\
R_B &= A + \cos(\theta_i + 135^\circ)
\end{aligned}
\]

Signals corresponding to the values of M, S and T
can be derived from these microphone responses
according to

\[
\begin{aligned}
M &= \tfrac{1}{2} X (L_F + R_F + L_B + R_B) = \frac{1}{\sqrt{2}} \\
S &= \tfrac{1}{2} (L_F - R_F + L_B - R_B) = \sqrt{2} \sin\theta_i \\
T &= \tfrac{1}{2} (L_F + R_F - L_B - R_B) = \sqrt{2} \cos\theta_i
\end{aligned}
\]

where

\[ X = \frac{1}{2\sqrt{2} A} \]



If X is set to unity, the often quoted equations for
M, S and T are given as follows

\[
\begin{aligned}
M &= \tfrac{1}{2} (L_F + R_F + L_B + R_B) \\
S &= \tfrac{1}{2} (L_F - R_F + L_B - R_B) \\
T &= \tfrac{1}{2} (L_F + R_F - L_B - R_B)
\end{aligned}
\]

However, the correct ratio of M to S and T is then
no longer preserved, except in the special case of

\[ A = \frac{1}{2\sqrt{2}} \]

The ratio of the value of M when X = 1 to the
ideal M, given when

\[ X = \frac{1}{2\sqrt{2} A}, \]

is shown in Table 1
for various microphone types. The parameter
quoted is the factor by which M is in error
relative to S and T. The results hold for both the
single source case and for an ensemble of sources.



Thus, although all the cardioid-type microphones
(except figure-of-eight) give the correct form of the
M, S and T responses, the ratio of M to $S_{peak}$
and $T_{peak}$ changes. It is necessary, therefore,
to standardise to a particular format (i.e. the ratio
of M to $M_{ideal}$) for transmission for the sake of
decoding. Criteria of signal handling dictate the
choice as will be discussed in Section 7.
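The matrixing of four corner microphones into M, S and T, and the error factors quoted in Table 1, can be reproduced with a short calculation. This is a sketch under the assumptions that each microphone responds as $A + \cos(\cdot)$, that the corner axes lie at ±45° and ±135°, and that X = 1; the names `cardioid_cluster`, `mst` and `M_IDEAL` are hypothetical.

```python
import math

def cardioid_cluster(theta_i_deg, a):
    """Responses of four coincident A + cos microphones aimed at the
    corners of a square, for a source at azimuth theta_i
    (measured anti-clockwise from centre-front)."""
    t = math.radians(theta_i_deg)
    offsets = {"LF": -45.0, "RF": 45.0, "LB": -135.0, "RB": 135.0}
    return {name: a + math.cos(t + math.radians(off)) for name, off in offsets.items()}

def mst(mics, x=1.0):
    """Sum and difference matrix giving M, S and T; x is the gain factor X."""
    m = 0.5 * x * (mics["LF"] + mics["RF"] + mics["LB"] + mics["RB"])
    s = 0.5 * (mics["LF"] - mics["RF"] + mics["LB"] - mics["RB"])
    t = 0.5 * (mics["LF"] + mics["RF"] - mics["LB"] - mics["RB"])
    return m, s, t

# Ideal M is half the peak value (sqrt 2) of S and T.
M_IDEAL = 1.0 / math.sqrt(2.0)

MIC_TYPES = {
    "cardioid": 1.0,
    "135 deg hypercardioid": 1.0 / math.sqrt(2.0),
    "120 deg hypercardioid": 0.5,
}

ratios = {}
for name, a in MIC_TYPES.items():
    m, s, t = mst(cardioid_cluster(30.0, a))
    ratios[name] = m / M_IDEAL
    print(name, round(ratios[name], 4), round(s, 4), round(t, 4))
```

S and T come out as $\sqrt{2}\sin\theta_i$ and $\sqrt{2}\cos\theta_i$ whatever the value of A, while M scales as 2A; only the M channel therefore needs a corrective gain.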

The symbols M, S and T are often used
to describe parameters other than those just given.
In the present context, therefore, some authors
use a different notation as follows:

\[ W = M, \qquad X = T, \qquad Y = S \]

with the addition of a fourth term Z, corresponding
to a figure-of-eight characteristic directed vertically
upwards. These symbols reflect conveniently
the directions of the right-handed set of x, y, z axes
of cartesian geometry.



Table 1 - M, S, T response for various cardioid-type microphones

Microphone type         A               Ratio of M to $M_{ideal}$
Cardioid                1               $2\sqrt{2}$
135° hypercardioid      $1/\sqrt{2}$    2
120° hypercardioid      1/2             $\sqrt{2}$
Figure-of-eight         0               0

4. Microphone mixing techniques

4.1 Pair-wise panning

One of the simplest forms of surround sound
reproduction is effected by using four separate
transmission channels to each of four loudspeakers
in a square or rectangular array; this transmission
system is often referred to as "discrete" quadraphony.
One widely used method of panning mono
sound sources that may be readily implemented
with this system consists of panning a mono
source according to a sin/cos law between two
adjacent channels, so called "pair-wise" panning.
To gain further insight into its implications for
surround sound transmission, it is profitable to treat
the technique with azimuthal harmonic analysis.

4.1.1 Decoding

In a first order transmission scheme, one
method of decoding dictates that the signal
feeds to four loudspeakers in a rectangle should be



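\[
\begin{aligned}
L'_F &= \tfrac{1}{2} \left( kM + \frac{S}{\sqrt{2} \sin\psi} + \frac{T}{\sqrt{2} \cos\psi} \right) \\
R'_F &= \tfrac{1}{2} \left( kM - \frac{S}{\sqrt{2} \sin\psi} + \frac{T}{\sqrt{2} \cos\psi} \right) \\
L'_B &= \tfrac{1}{2} \left( kM + \frac{S}{\sqrt{2} \sin\psi} - \frac{T}{\sqrt{2} \cos\psi} \right) \\
R'_B &= \tfrac{1}{2} \left( kM - \frac{S}{\sqrt{2} \sin\psi} - \frac{T}{\sqrt{2} \cos\psi} \right)
\end{aligned}
\]

where the loudspeakers are at azimuths $\pm\psi$ and
$180^\circ \pm \psi$ and $k$ is a constant.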
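A rectangle decoder of this kind can be sketched in a few lines. The sketch assumes feeds of the form $\tfrac{1}{2}(kM \pm S/(\sqrt{2}\sin\psi) \pm T/(\sqrt{2}\cos\psi))$, with S added to the left-hand loudspeakers and T to the front ones; the function name `rectangle_decode` and its argument names are hypothetical.

```python
import math

def rectangle_decode(m, s, t, psi_deg=45.0, k=1.0):
    """Loudspeaker feeds for a rectangle with speakers at azimuths
    +/-psi and 180 +/- psi degrees (assumed form of the decoder)."""
    psi = math.radians(psi_deg)
    s_term = s / (math.sqrt(2.0) * math.sin(psi))
    t_term = t / (math.sqrt(2.0) * math.cos(psi))
    return {
        "LF": 0.5 * (k * m + s_term + t_term),
        "RF": 0.5 * (k * m - s_term + t_term),
        "LB": 0.5 * (k * m + s_term - t_term),
        "RB": 0.5 * (k * m - s_term - t_term),
    }

# For a square (psi = 45 degrees) and k = 1 the divisors become unity,
# so each feed is simply half the sum or difference of M, S and T.
feeds = rectangle_decode(1.0, 0.2, 0.4)
print(feeds)
```

Narrowing the rectangle (smaller $\psi$) raises the weight given to S and lowers that given to T, which compensates for the loudspeakers lying closer to the front-back axis.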

If M, S and T are such that the ratio of M to
$M_{ideal}$ is $\sqrt{2}$, as has been suggested for some
transmission systems, then $k$ should assume values
of



1 or —(see reference 7). 

If $k = 1$ and the speakers are in a square with
$\psi = 45^\circ$, then



\[
\begin{aligned}
L'_F &= \tfrac{1}{2} (M + S + T) \\
R'_F &= \tfrac{1}{2} (M - S + T) \\
L'_B &= \tfrac{1}{2} (M + S - T) \\
R'_B &= \tfrac{1}{2} (M - S - T)
\end{aligned}
\qquad (1)
\]

This is one of the simplest forms of decoding and
forms the basis for the pair-wise panning scheme.

4.1.2 Recording and transmission

With four loudspeakers in a square, by modifying
the decoding Equations (1), it is possible to send
M, S and T like signals down the "M", "S" and "T"
channels, together with a fourth signal Q in order
to give single speaker excitation. This, of course,
represents a high azimuthal order in the reproduced
sound field and is not in keeping with the first
order scheme. The four channel signals are
formulated as follows:

\[
\begin{aligned}
M_P &= \tfrac{1}{2} (L_F + R_F + L_B + R_B) \\
S_P &= \tfrac{1}{2} (L_F - R_F + L_B - R_B) \\
T_P &= \tfrac{1}{2} (L_F + R_F - L_B - R_B) \\
Q_P &= \tfrac{1}{2} (L_F - R_F - L_B + R_B)
\end{aligned}
\qquad (2)
\]

where the $L_F$, $R_F$, $L_B$ and $R_B$ signals are
derived from a mono signal panned according to
a sin/cos law between pairs of adjacent channels,
so called pair-wise panning. If the source is to be
panned across a corner, switching of the signal
pairs takes place. The azimuthal characteristics
are similar to those of pairs of orthogonal figure-of-eight
microphones under free-field conditions.
If we take as an example the front pair of corners,
where pair-wise panning is achieved by arranging
that

\[
\begin{aligned}
L_F &= \cos(\theta_i - 45^\circ) \\
R_F &= \cos(\theta_i + 45^\circ) \\
L_B &= 0 \\
R_B &= 0
\end{aligned}
\]

we have, by substitution in Equations (2),

\[
M_P = T_P = \frac{1}{\sqrt{2}} \cos\theta_i, \qquad
S_P = Q_P = \frac{1}{\sqrt{2}} \sin\theta_i
\]

The corresponding decoding (for a square layout) is

\[
\begin{aligned}
L'_F &= \tfrac{1}{2} (M_P + S_P + T_P + Q_P) \\
R'_F &= \tfrac{1}{2} (M_P - S_P + T_P - Q_P) \\
L'_B &= \tfrac{1}{2} (M_P + S_P - T_P - Q_P) \\
R'_B &= \tfrac{1}{2} (M_P - S_P - T_P + Q_P)
\end{aligned}
\qquad (3)
\]

Substituting for $M_P$, $S_P$, $T_P$ and $Q_P$ gives

\[
\begin{aligned}
L'_F &= \cos(\theta_i - 45^\circ) \\
R'_F &= \cos(\theta_i + 45^\circ) \\
L'_B &= 0 \\
R'_B &= 0
\end{aligned}
\]

as before. Thus, if a source is panned to
$L_F$, i.e. $\theta_i = 45^\circ$, only the $L'_F$ speaker is excited.
It is seen that this decoding is compatible with
Equations (1).

Fig. 2 - Pair-wise panning: M, S, T, Q.

The manner in which the $M_P$, $S_P$, $T_P$ and $Q_P$
signals vary with $\theta_i$ for all the pairs of corners is
shown in Table 2 and Fig. 2. $S_P$ and $T_P$ have the
correct $\sin\theta_i$ and $\cos\theta_i$ form respectively for all
values of $\theta_i$ from 0° to 360°, but the $M_P$ signal
only approximates to a constant and drops by 3 dB
at the corners; the $Q_P$ signal has an X-shaped
characteristic. The $M_P$ and $Q_P$ signals suffer
discontinuities at the corner positions and therefore
contain an infinite series of high order azimuthal
components.
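The encode and decode matrices for pair-wise panning can be exercised directly. The sketch below assumes the sum and difference forms of Equations (2) and (3) and a sin/cos pan law across the front pair only; the function names are hypothetical.

```python
import math

def pairwise_pan_front(theta_i_deg):
    """sin/cos pan of a unit mono source between L_F and R_F,
    valid for -45 <= theta_i <= 45 degrees."""
    t = math.radians(theta_i_deg)
    return {
        "LF": math.cos(t - math.pi / 4.0),
        "RF": math.cos(t + math.pi / 4.0),
        "LB": 0.0,
        "RB": 0.0,
    }

def encode_mstq(c):
    """Corner signals to M_P, S_P, T_P, Q_P (Equations (2))."""
    m = 0.5 * (c["LF"] + c["RF"] + c["LB"] + c["RB"])
    s = 0.5 * (c["LF"] - c["RF"] + c["LB"] - c["RB"])
    t = 0.5 * (c["LF"] + c["RF"] - c["LB"] - c["RB"])
    q = 0.5 * (c["LF"] - c["RF"] - c["LB"] + c["RB"])
    return m, s, t, q

def decode_mstq(m, s, t, q):
    """M_P, S_P, T_P, Q_P back to loudspeaker feeds (Equations (3))."""
    return {
        "LF": 0.5 * (m + s + t + q),
        "RF": 0.5 * (m - s + t - q),
        "LB": 0.5 * (m + s - t - q),
        "RB": 0.5 * (m - s - t + q),
    }

corner = decode_mstq(*encode_mstq(pairwise_pan_front(45.0)))
centre = encode_mstq(pairwise_pan_front(0.0))
print(corner)
print(centre)
```

Because the 4 x 4 matrix of Equations (2) is its own inverse, decoding with Equations (3) returns the panned corner signals exactly, which is why a source panned to a corner excites a single loudspeaker.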

The series expansion of Qp may be shown to be 



-V(-^si 



mr 
sm + 



nir 



^ + z cos — ) sin 2nd,- 

2 2n + l 2 ' 



n=l 



= - (sin 2di sin 40,- 

TT ' 5 ' 



4.1.3 Discussion 



-sm 



9, +-sin8e,-+... ) 



Table 2 - Pair-wise panning for a square layout
(each entry is to be multiplied by $1/\sqrt{2}$)

            Pair-wise panning between corners
            $L_F$ and $R_F$    $L_B$ and $L_F$    $R_B$ and $L_B$    $R_B$ and $R_F$
$M_P$       $\cos\theta_i$     $\sin\theta_i$     $-\cos\theta_i$    $-\sin\theta_i$
$S_P$       $\sin\theta_i$     $\sin\theta_i$     $\sin\theta_i$     $\sin\theta_i$
$T_P$       $\cos\theta_i$     $\cos\theta_i$     $\cos\theta_i$     $\cos\theta_i$
$Q_P$       $\sin\theta_i$     $\cos\theta_i$     $-\sin\theta_i$    $-\cos\theta_i$

Pair-wise panning and the associated encoding
and decoding are an artificial means of achieving
high order transmission and reproduction. Advantage
is taken of the idiosyncrasies of the decoding
(due primarily to the limited number of
loudspeakers) to obtain sound images of anomalously
high resolution at selected positions in the
reproduced sound field. To achieve this, special
high order signals, similar to M, S and T, together
with a fourth signal Q are transmitted down the
"M", "S", "T" and "Q" channels. These signals
($M_P$, $S_P$, $T_P$ and $Q_P$) do not correspond to the
azimuthal components of a real sound field.

To realise this high degree of resolution, the
encoding must correspond closely to the decoding
format. The coding used for pair-wise panning
differs therefore from correct M, S and T coding
which is an entirely general scheme and is much less
dependent on the decoding adopted. However,
although pair-wise panning does not conform to
the strict characteristics of correct M, S and T
transmission, it does make best use of the limitations
of the rectangular array of loudspeakers as
a decoding method. In so doing, it provides an
effective and practicable technique that is an
important facility in the production repertoire.

In view of the discontinuous nature of pair-wise
panning signals, it is not easy to convert them
to correct M, S and T signals. However, within the
context of 2-channel, System HJ (13LP2) encoding,
both types of signal may be fed to an HJ encoder
with $L_F$, $R_F$, $L_B$ and $R_B$ inputs with satisfactory
results. Pair-wise panning signals appear as four
corner signals $L_F$, $R_F$, $L_B$ and $R_B$ and are fed
directly to the encoder inputs. With M, S and T
signals, where the ratio of M to $M_{ideal}$ is 2, corner
signals corresponding to those from a cluster of
four 135° hypercardioid microphones may be
derived using Equations (1). These signals, or
those from an actual 135° hypercardioid microphone
cluster, may then also be fed directly to the
HJ encoder. Two encode loci are given with the
two sets of signals and both give satisfactory
results. These are shown on the Scheiber sphere in
Fig. 3 and they touch at the centre quadrant
positions. The compatibility between the two
types of source signal is an important and valuable
feature of the 13LP2 encode option of System HJ.



4.2 Coincident pairs of microphones 

Pairs of cardioid microphones, subtending angles
typically of 90° or 180°, are very frequently used
in quad and stereo*. These normally give the
same result as an orthogonal pair of figure-of-eight
microphones, but with the addition of a constant
omni-directional component to the M and T signals.
Thus, an orthogonal pair of cardioid microphones,
directed towards $C_F$, with the "left" microphone
signal

\[ L = 1 + \cos(\theta_i - 45^\circ) \]

fed to the $L_F$ input and the "right" microphone
signal

\[ R = 1 + \cos(\theta_i + 45^\circ) \]

fed to the $R_F$ input, gives, using Equations (2),

\[
\begin{aligned}
M &= 1 + \frac{1}{\sqrt{2}} \cos\theta_i \\
S &= \frac{1}{\sqrt{2}} \sin\theta_i \\
T &= 1 + \frac{1}{\sqrt{2}} \cos\theta_i \\
Q &= \frac{1}{\sqrt{2}} \sin\theta_i
\end{aligned}
\]



Fig. 3 - System HJ tolerance zones and option 13LP2 loci (pan-pot and hypercardioid)

As with pair-wise panning of mono sources, which
is equivalent to pairs of figure-of-eight microphones,
pairs of other cardioid-type microphones have
azimuthal responses that are only approximations
to correct M, S, T signals, i.e. of the form A,
$\sin\theta_i$, $\cos\theta_i$ respectively. Nevertheless, when
decoded and reproduced as described by Equations
(2) and (3), satisfactory results are given with this
technique.

In the course of a normal sound balance, the
particular tonal and azimuthal characteristics of
the various cardioid pairs are often exploited to
achieve certain effects*. Tonal differences between
the various cardioid pairs can be traced to the
anomalous tonal characteristics and the
non-coincidence of the microphones at high frequencies.
The recently developed "Sound field" microphone
goes some way towards overcoming these deficiencies,
as well as giving correct M, S and T signals.
This will be described in Section 5.2.



5. Spherical harmonic analysis 

5.1 First order characteristics in three dimensions 

Using polar co-ordinates $(r, \theta, \phi)$ (see Fig. 4),
the figure-of-eight characteristics along the x-, y-
and z-axes are respectively

\[ \cos\theta \sin\phi, \qquad \sin\theta \sin\phi \qquad \text{and} \qquad \cos\phi \]

These form the first order azimuthal components
in three dimensions, i.e. the first order spherical
harmonics (the omni-directional component is the
zero order term). Adopting the notation quoted
at the end of Section 3, we have that the first
order component magnitudes for a point source
located in the direction $(r, \theta_i, \phi_i)$ are

\[
\begin{aligned}
W &= 1/3 \\
X &= \cos\theta_i \sin\phi_i \\
Y &= \sin\theta_i \sin\phi_i \\
Z &= \cos\phi_i
\end{aligned}
\]

Thus, if the source distribution function $g(\theta, \phi)$
corresponds to a single point source at $(r, \theta_i, \phi_i)$,

\[ g(\theta, \phi) = 1/3 + \cos\theta_i \sin\phi_i \cos\theta \sin\phi
 + \sin\theta_i \sin\phi_i \sin\theta \sin\phi + \cos\phi_i \cos\phi \]

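For an ensemble of sources with general distribution
function $g(\theta, \phi)$,

\[
\begin{aligned}
W &= \tfrac{1}{3} \int_{\theta=0}^{2\pi} \int_{\phi=0}^{\pi} g(\theta, \phi) \sin\phi \, d\theta \, d\phi \\
X &= \int_{0}^{2\pi} \int_{0}^{\pi} g(\theta, \phi) \cos\theta \sin^2\phi \, d\theta \, d\phi \\
Y &= \int_{0}^{2\pi} \int_{0}^{\pi} g(\theta, \phi) \sin\theta \sin^2\phi \, d\theta \, d\phi \\
Z &= \int_{0}^{2\pi} \int_{0}^{\pi} g(\theta, \phi) \cos\phi \sin\phi \, d\theta \, d\phi
\end{aligned}
\]

Fig. 4 - Polar co-ordinates

Fig. 5 - Sound-field microphones: capsule axes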



As in the 2-D case, it is often convenient to change 
the gain of W relative to X, Y and Z for optimum 
signal handling and transmission. 

Spherical harmonic analysis forms a powerful
technique for studying surround sound in three
dimensions and two applications are now described.



5.2 Sound field microphone

The "Sound-field" microphone samples in
three dimensions the sound field at a point in
space. It offers unprecedented operational
versatility and allows one effectively to tilt and
rotate the microphone and to change its directional
characteristics remotely by electronic processing of
output signals. It was developed by P.G. Craven
and M.A. Gerzon under the auspices of the NRDC*.
So far as is known, no analytical description of its
functioning has been published and the principles
of operation are presented here.

The "microphone" in fact consists of four 
capsules that are directed symmetrically in space, 
that is, towards the corners of a tetrahedron (see 
Fig. 5). It is further necessary that the micro- 
phones are effectively coincident at all audio 
frequencies to prevent phase anomalies and this 
is achieved at all but the highest frequencies. 

The response of a figure-of-eight microphone
in the direction $(r, \theta_j, \phi_j)$ to a source in
$(r, \theta_i, \phi_i)$ is

\[ \sin\phi_i \sin\phi_j \cos(\theta_i - \theta_j) + \cos\phi_i \cos\phi_j \]

It may be recalled from Section 3 that the general
cardioid response is given by

A + figure-of-eight response

This gives the general cardioid response in three
dimensions as

\[ A + \sin\phi_i \sin\phi_j \cos(\theta_i - \theta_j) + \cos\phi_i \cos\phi_j \]

or

\[ A + X \cos\theta_j \sin\phi_j + Y \sin\theta_j \sin\phi_j + Z \cos\phi_j \qquad (4) \]

* National Research and Development Corporation

Using Equation (4), we find that the responses of
the four capsules of the sound field microphone,
assuming each has a cardioid characteristic (i.e. A = 1),
are

\[
\begin{aligned}
L_{FU} &= 1 + \sqrt{1/3} \left[ \sin\phi_i (\cos\theta_i + \sin\theta_i) + \cos\phi_i \right] \\
R_{FD} &= 1 + \sqrt{1/3} \left[ \sin\phi_i (\cos\theta_i - \sin\theta_i) - \cos\phi_i \right] \\
L_{BD} &= 1 + \sqrt{1/3} \left[ \sin\phi_i (-\cos\theta_i + \sin\theta_i) - \cos\phi_i \right] \\
R_{BU} &= 1 + \sqrt{1/3} \left[ \sin\phi_i (-\cos\theta_i - \sin\theta_i) + \cos\phi_i \right]
\end{aligned}
\]

Applying matrix addition similar to that used for
M, S, T and Q (see Equations (2)), we have

\[
\begin{aligned}
W &= \tfrac{1}{2} (L_{FU} + R_{FD} + L_{BD} + R_{BU}) = 2 \\
X &= \tfrac{1}{2} (L_{FU} + R_{FD} - L_{BD} - R_{BU}) = 2\sqrt{1/3} \cos\theta_i \sin\phi_i \\
Y &= \tfrac{1}{2} (L_{FU} - R_{FD} + L_{BD} - R_{BU}) = 2\sqrt{1/3} \sin\theta_i \sin\phi_i \\
Z &= \tfrac{1}{2} (L_{FU} - R_{FD} - L_{BD} + R_{BU}) = 2\sqrt{1/3} \cos\phi_i
\end{aligned}
\]

These signals form the basis for all subsequent
processing. From W, X, Y and Z, any cardioid
characteristic directed along any direction $(r, \theta_j, \phi_j)$
may be synthesised (see Equation (4)) according to

\[ A + \sin\phi_j \left[ X \cos\theta_j + Y \sin\theta_j \right] + Z \cos\phi_j \]

where A is proportional to W.

Fig. 6 - Cardioid microphone in the direction $(r, \theta_j, \phi_j)$

Fig. 6 shows a circuit for generating this
response from the signals W, X, Y and Z. Sin/cos
potentiometers (see Fig. 7) are used and are
described in more detail in Reference 11. To
synthesise a pair of orthogonal cardioid microphones
in the horizontal plane, directed along
$(r, \pm 45^\circ, 90^\circ)$ for example,

\[ A = 1, \qquad \theta_j = \pm 45^\circ \qquad \text{and} \qquad \phi_j = 90^\circ \]

The result is independent of the Z component.
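The capsule matrixing and the synthesis of a virtual microphone can be followed numerically. This is a sketch under stated assumptions: the capsules point at the tetrahedron corners (±1, ±1, ±1)/√3 with x towards the front, y towards the left and z upwards, each capsule responds as A plus the cosine of the angle to the source, and A in the virtual microphone is taken as a chosen gain times W. All names (`CAPSULES`, `capsule_outputs`, `wxyz`, `virtual_cardioid`, `a_gain`) are hypothetical.

```python
import math

# Corner directions before normalisation (x = front, y = left, z = up).
CAPSULES = {
    "LFU": (1, 1, 1),
    "RFD": (1, -1, -1),
    "LBD": (-1, 1, -1),
    "RBU": (-1, -1, 1),
}

def source_direction(theta_deg, phi_deg):
    """Unit vector for azimuth theta and polar angle phi (from the z-axis)."""
    th, ph = math.radians(theta_deg), math.radians(phi_deg)
    return (math.cos(th) * math.sin(ph), math.sin(th) * math.sin(ph), math.cos(ph))

def capsule_outputs(theta_deg, phi_deg, a=1.0):
    """Responses of the four capsules to a unit source."""
    sx, sy, sz = source_direction(theta_deg, phi_deg)
    scale = math.sqrt(1.0 / 3.0)
    return {n: a + scale * (cx * sx + cy * sy + cz * sz)
            for n, (cx, cy, cz) in CAPSULES.items()}

def wxyz(c):
    """Sum and difference matrix giving W, X, Y and Z."""
    w = 0.5 * (c["LFU"] + c["RFD"] + c["LBD"] + c["RBU"])
    x = 0.5 * (c["LFU"] + c["RFD"] - c["LBD"] - c["RBU"])
    y = 0.5 * (c["LFU"] - c["RFD"] + c["LBD"] - c["RBU"])
    z = 0.5 * (c["LFU"] - c["RFD"] - c["LBD"] + c["RBU"])
    return w, x, y, z

def virtual_cardioid(w, x, y, z, theta_j_deg, phi_j_deg, a_gain=0.5):
    """A + sin(phi_j)[X cos(theta_j) + Y sin(theta_j)] + Z cos(phi_j),
    with A = a_gain * W."""
    th, ph = math.radians(theta_j_deg), math.radians(phi_j_deg)
    return a_gain * w + math.sin(ph) * (x * math.cos(th) + y * math.sin(th)) + z * math.cos(ph)

w, x, y, z = wxyz(capsule_outputs(30.0, 70.0))
left = virtual_cardioid(w, x, y, z, 45.0, 90.0)
right = virtual_cardioid(w, x, y, z, -45.0, 90.0)
print(w, x, y, z, left, right)
```

With $\phi_j = 90^\circ$ the Z term is multiplied by $\cos 90^\circ$, so the horizontal pair is unaffected by the vertical component, as stated in the text.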

Having generated W, X, Y and Z from the
sound field microphone, one may effectively tilt
and rotate the entire array using the following
matrix

\[
\begin{pmatrix} W' \\ X' \\ Y' \\ Z' \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\theta' \cos\phi' & \sin\theta' \cos\phi' & \sin\phi' \\
0 & -\sin\theta' & \cos\theta' & 0 \\
0 & -\cos\theta' \sin\phi' & -\sin\theta' \sin\phi' & \cos\phi'
\end{pmatrix}
\begin{pmatrix} W \\ X \\ Y \\ Z \end{pmatrix}
\]

where $\theta'$ = angle by which the array is rotated
anticlockwise about the z-axis and $\phi'$ = angle
by which the array is elevated about the current y'
(or left/right) axis.

Fig. 7 - Sine/cosine potentiometer

Fig. 8 (circuit diagram for the tilt and rotation of W, X, Y, Z signals)



Fig. 8 shows a means of realising this matrix.
In addition, a variable gain of the W signal would
allow remote selection of the directional characteristics
of the "microphones". Further processing of
the W, X, Y and Z signals may be carried out readily
to vary the stereo width and other parameters.
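A rotation and tilt of this kind amounts to a 3 x 3 rotation applied to X, Y and Z, with W passed through unchanged. The sketch below assumes the rows (cos θ' cos φ', sin θ' cos φ', sin φ'), (-sin θ', cos θ', 0) and (-cos θ' sin φ', -sin θ' sin φ', cos φ'); the function name `tilt_rotate` is hypothetical.

```python
import math

def tilt_rotate(w, x, y, z, theta_rot_deg, phi_tilt_deg):
    """Rotate the array by theta' about the z-axis, then elevate it by
    phi' about the new left/right axis (assumed matrix form)."""
    th, ph = math.radians(theta_rot_deg), math.radians(phi_tilt_deg)
    x2 = math.cos(th) * math.cos(ph) * x + math.sin(th) * math.cos(ph) * y + math.sin(ph) * z
    y2 = -math.sin(th) * x + math.cos(th) * y
    z2 = -math.cos(th) * math.sin(ph) * x - math.sin(th) * math.sin(ph) * y + math.cos(ph) * z
    return w, x2, y2, z2

# A horizontal source at azimuth 40 degrees, seen by an array that is
# itself turned 40 degrees anticlockwise, should appear at centre-front.
az = math.radians(40.0)
turned = tilt_rotate(1.0, math.cos(az), math.sin(az), 0.0, 40.0, 0.0)
# Tilting the array up by 90 degrees brings an overhead source to the front.
tilted = tilt_rotate(1.0, 0.0, 0.0, 1.0, 0.0, 90.0)
print(turned)
print(tilted)
```

Since the three rows are orthonormal, the sum of the squares of X, Y and Z is preserved, so the operation changes only the apparent direction of a source and not its level.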

5.3 W. X, y. Z pan-pots 

Based on principles similarto those just discussed, 
two dimensional and three dimensional pan-pots 
that generate W, X, Y and Z signals may be form- 
ulated^ ^'^^. The magnitudes of the first order 
azimuthal components due to a source at (r, d^, 0,-) 
are given by 



$$
\begin{aligned}
W &= \text{constant},\ B \\
X &= \cos\theta_i \sin\phi_i \\
Y &= \sin\theta_i \sin\phi_i \\
\text{and}\quad Z &= \cos\phi_i
\end{aligned}
$$



To simulate a source at this position, given a mono
signal, the same signals should be generated. In the
horizontal plane, $\phi_i = 90^\circ$, and Fig. 9 shows one
method of realising a horizontal pan-pot. Fig. 10
shows a realisation of a three dimensional pan-pot.

When W = B, these pan-pots give notionally
"exterior panning", i.e. the source is panned along 
the perimeter of the stage (this depends to some 
extent on the decoding method adopted). In order 
to pan inwards, W is increased, although it is more 
practical to reduce the level of the X, Y and Z 
signals. When X, Y and Z equal zero, the centre 
position is simulated. 
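
The pan-pot law can be illustrated with a short sketch. It generates W, X, Y and Z from a mono sample using the first order expressions for a source at azimuth θ and angle φ from the z-axis, and models interior panning by scaling X, Y and Z down while W is held at B. The function name `pan_pot` and the `interior` parameter (1 for the perimeter, 0 for the centre) are illustrative assumptions rather than anything defined in the Report.

```python
import math


def pan_pot(mono, azimuth_deg, phi_deg=90.0, b=1.0, interior=1.0):
    """Return (W, X, Y, Z) for a mono sample panned to (azimuth, phi).

    phi_deg is measured from the z-axis, so 90 degrees is the horizontal plane.
    b is the constant W gain; interior scales X, Y, Z (assumed parameter):
    1.0 gives exterior panning, 0.0 gives the centre position.
    """
    theta = math.radians(azimuth_deg)
    phi = math.radians(phi_deg)
    w = b * mono
    x = interior * math.cos(theta) * math.sin(phi) * mono
    y = interior * math.sin(theta) * math.sin(phi) * mono
    z = interior * math.cos(phi) * mono
    return w, x, y, z


# Source hard left in the horizontal plane, on the perimeter of the stage.
print(pan_pot(1.0, azimuth_deg=90.0))
# Same source panned fully inwards: only W remains.
print(pan_pot(1.0, azimuth_deg=90.0, interior=0.0))
```

Reducing X, Y and Z rather than raising W keeps the W level constant across sources, which matches the practical preference stated above.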



6. Routing module 

In a mixing desk with quadraphonic capability, 
each input channel (routing module) has panning 
arrangements that allow the source to be panned 
to any azimuth in the horizontal plane. Conventionally,
these give $L_F$, $R_F$, $L_B$ and $R_B$ outputs,
but if only first order components are required,
the surround sound signals may be represented by
W, X and Y (see Sections 2 and 3), thereby saving a
channel. Panning arrangements based on those
described in Section 5.3 and Fig. 9 may be used
for this purpose.

Fig. 9 - Horizontal W, X, Y pan-pot

Fig. 10 - Three dimensional W, X, Y, Z pan-pot

A mixing desk based on these W, X, Y routing 
modules has W, X and Y group outputs. This may 
be followed by a 2-channel System HJ encoder 
which accepts W, X and Y input signals. Under 
certain circumstances, it may be desirable to encode 
points corresponding to positions given by pair-wise 
panning. As was seen in Section 4, three signals, 
W, X and Y, are not normally sufficient to 
describe such positions. However, given a particular 
2-channel encode matrix, it is possible to simulate 
them in the following way. Since the absolute 
levels and relative phase fully describe the two 
encoded output signals, three variables are sufficient 
to describe them. Thus, the three variables W, X 
and Y (which are real) are also sufficient to describe 
and simulate any position on the Scheiber sphere. 
Analytic equations have been derived (see Appendix 1)
to determine these values. Values of W, X and
Y that simulate the corner positions of pair-wise
panning for the 13LP2 locus (within the System HJ
specification) have been determined. These are
given in Appendix 1 together with the W, X, Y,
13LP2 encode equations.

One result of this process is that the absolute 
phase of the encoded output signals is normally 
different to that of the standard 13LP2 locus given 
by pair-wise panning. This implies a W, X, Y "pair-wise"
pan-pot with switchable positions because a
continuous version would be inordinately complex. 



7. W, X, Y, Z signal handling. 

It is often convenient to transfer and record 
signals in W, X, Y, Z form and since the gain of the 
W signal may be chosen arbitrarily (except for 
zero), standardisation is needed. The main
criterion is signal level. If, for a single source,
W, X, Y and Z take the form $B$, $\cos\theta_i \sin\phi_i$,
$\sin\theta_i \sin\phi_i$ and $\cos\phi_i$ respectively, then
X, Y and Z peak to unity and B* may be determined
by one of the following criteria:

(1) If the criterion is that, for a single panned 
source, W, X, Y and Z should all peak to the 
same figure, then B should be set to unity. 

(2) If there is an equal probability that a source
may be at any position in the horizontal plane
(neglecting interior panning) and if there are
no sources off the horizontal plane, then,
taking a long term or ensemble average,
B should be $1/\sqrt{2}$.

(3) If there is an equal probability that a source
will be placed at any position in three
dimensions (neglecting interior panning), then
B should be $1/\sqrt{3}$.

* Note that the constant B need not be equal to the constant A
mentioned in Sections 3 and 5.

It is not clear which of these options is the most 
appropriate because of the complex nature of real 
programme signals. At worst, a loss of 7 dB of 
signal-to-noise may result. 
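
The values quoted for criteria (2) and (3) can be checked numerically. The sketch below assumes that the averaging criterion means equal mean-square level in W and in each directional signal, which is an interpretation rather than a statement from the Report; the function names are illustrative.

```python
import math


def b_horizontal(steps=3600):
    """B that matches the mean-square level of X for sources spread
    uniformly in azimuth in the horizontal plane (X = cos(theta))."""
    mean_sq = sum(math.cos(2 * math.pi * k / steps) ** 2 for k in range(steps)) / steps
    return math.sqrt(mean_sq)


def b_sphere(steps=2000):
    """B that matches the mean-square level of Z for sources spread
    uniformly over the sphere (Z = cos(phi), area weight sin(phi))."""
    dphi = math.pi / steps
    total = 0.0
    for k in range(steps):
        phi = (k + 0.5) * dphi  # midpoint rule
        total += math.cos(phi) ** 2 * math.sin(phi) * dphi
    mean_sq = total / 2.0  # integral of sin(phi) over 0..pi is 2
    return math.sqrt(mean_sq)


print(b_horizontal(), 1 / math.sqrt(2))
print(b_sphere(), 1 / math.sqrt(3))
```

By symmetry the same mean-square value applies to X and Y in the three dimensional case, so a single value of B serves all three directional signals.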



8. Transmission 

Up to this point, the handling and processing 
of signals closely related to the original sound-field 
have been discussed. These original signals may
undergo modifications by being mixed with others
(e.g. from other microphone inputs or reverberation 
plate returns) and the resultant "mix" represents 
the "programme" sound field. Additionally, the 
signals may be encoded by a complex matrix from 
four to two channels whereby phase information is 
introduced. 

It has been common practice, particularly 
before the days of quadraphonic matrix encoding, 
to use the symbols M and S to describe the sum and 
difference signals of the left and right stereo signals, 
thus: 



$$
M = \tfrac{1}{2}(L + R), \qquad S = \tfrac{1}{2}(L - R)
$$



These are the signals that are multiplexed for stereo 
transmission. With normal stereo they relate 
closely to the M and S channel signals of the "pro- 
gramme" sound field. In the case of matrix 
quadraphony, however, this is only true in the 
simplest sense because of the phase-difference that 
is introduced between the signals. It is convenient 
therefore to define new symbols^ to describe the 
transmission sum and difference signals: 



$$
\Sigma = \text{sum} \quad \text{and} \quad \Delta = \text{difference}
$$

Third and fourth terms T and Q are also sometimes
used to describe third and fourth signals to
be multiplexed^"'. 



9. Conclusions 

This Report has shown how spherical (and 
azimuthal) harmonic analysis can help to delineate 
the essential elements of the recording, trans- 
mission and reproduction of surround sound. 
Theoretically ideal systems have been described 
based on results derived using this form of analysis. 
Means of recording that conform to these ideal 
systems have been developed, namely the "Sound- 
field" microphone, which samples the first order 
characteristics at a point in a sound field, and the 
W, X, Y pan-pots. 

It is argued that the familiar and widespread
studio technique of pair-wise panning is an
anomalous means of achieving high order resolution 
in the reproduced sound stage, given a limited 
number of transmission channels and loudspeakers. 
Sharp images are only achieved by taking advantage 
of the deficiencies of the decoding. Nevertheless, 
in quadraphonic production, the technique has been 
shown empirically to be successful and valuable. 

With current developments of surround sound 
transmission, the weakest link in the chain is 
probably that of decoding. With a limited number 
of loudspeakers present, only approximations to 
"ideal" decoding have been achieved and these 
rely on the inadequately understood properties of 
hearing. Considerable scope for research presents 
itself in this field.

Appendix 1

W, X, Y 13LP2 (HJ) Encoding and panning



The 13LP2 matrix encoding equation for 
W, X, Y inputs is 



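$$
\begin{bmatrix} L_T \\ R_T \end{bmatrix} =
\begin{bmatrix}
0.6666\angle{-14.6^\circ} & 0.3852\angle{73.8^\circ} & 0.6370\angle{-9.1^\circ} \\
0.6666\angle{14.6^\circ} & 0.3852\angle{-73.8^\circ} & 0.6370\angle{-170.9^\circ}
\end{bmatrix}
\begin{bmatrix} W \\ X \\ Y \end{bmatrix}
\qquad \text{(A1)}
$$

where

$$
W = 1, \qquad X = \cos\theta_i, \qquad Y = \sin\theta_i
$$

and where $\theta_i$ = source azimuth, measured
anticlockwise from $C_F$.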

With this encoding, if we wish to simulate a 
particular microphone and source combination 
that gives a known combination of $L_T$ and $R_T$
(as for example for pair-wise panning), then the
values of W, X and Y may be found in the
following way. We will make use of the fact that,
for matrix encoding, it is necessary to maintain
interchannel phase integrity, but not that of
absolute phase. The absolute levels of $L_T$ and $R_T$
must, however, be correctly reproduced.

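Equation (A1) is of the form:

$$
\begin{bmatrix} L_T \\ R_T \end{bmatrix} =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23}
\end{bmatrix}
\begin{bmatrix} W \\ X \\ Y \end{bmatrix}
$$

or

$$
\begin{aligned}
L_r &= a_{11r} W + a_{12r} X + a_{13r} Y \\
L_i &= -a_{11i} W + a_{12i} X - a_{13i} Y \\
R_r &= a_{11r} W + a_{12r} X - a_{13r} Y \\
R_i &= a_{11i} W - a_{12i} X - a_{13i} Y
\end{aligned}
$$

where $L_r$, $L_i$ and $R_r$, $R_i$ are the real and imaginary
parts of $L_T$ and $R_T$ respectively and $a_{pqr}$, $a_{pqi}$ are the
moduli of the real and imaginary parts of $a_{pq}$.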



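If we advance the phases of $L_T$ and $R_T$ by an
angle $\eta$ whereby

$$
\begin{aligned}
L'_r &= L_r \cos\eta - L_i \sin\eta \\
L'_i &= L_i \cos\eta + L_r \sin\eta
\end{aligned}
$$

and similarly for $R'_r$ and $R'_i$, we find that in
order to satisfy the given equations,

$$
\tan\eta = \frac{a_{13i}(L_r - R_r) + a_{13r}(L_i + R_i)}{-a_{13r}(L_r + R_r) + a_{13i}(L_i - R_i)}
$$

W, X and Y are then given by

$$
\begin{aligned}
W &= \frac{a_{12i}(L'_r + R'_r) - a_{12r}(L'_i - R'_i)}{2(a_{11r} a_{12i} + a_{11i} a_{12r})} \\
X &= \frac{a_{11i}(L'_r + R'_r) + a_{11r}(L'_i - R'_i)}{2(a_{11r} a_{12i} + a_{11i} a_{12r})} \\
Y &= \frac{L'_r - R'_r}{2 a_{13r}}
\end{aligned}
$$
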

The values of W, X and Y that, when encoded
according to Equation (A1), simulate the corner
positions given by pair-wise panning, are shown in
Table A1.
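
The procedure of this Appendix can be sketched numerically as follows. The code encodes W, X, Y with the coefficients of Equation (A1), then recovers W, X, Y from an encoded pair whose absolute phase has been shifted by an arbitrary angle. The function names are illustrative assumptions, and choosing the branch of η that makes W positive is a convention adopted for this sketch, since tan η only fixes η to within 180°.

```python
import cmath
import math


def _polar(mod, deg):
    return cmath.rect(mod, math.radians(deg))


# Coefficients of Equation (A1), rows for the left and right encoded signals.
A = [
    [_polar(0.6666, -14.6), _polar(0.3852, 73.8), _polar(0.6370, -9.1)],
    [_polar(0.6666, 14.6), _polar(0.3852, -73.8), _polar(0.6370, -170.9)],
]


def encode(w, x, y):
    """Return the complex encoded pair (left, right) for real W, X, Y."""
    left = A[0][0] * w + A[0][1] * x + A[0][2] * y
    right = A[1][0] * w + A[1][1] * x + A[1][2] * y
    return left, right


def solve_wxy(left, right):
    """Find real W, X, Y that encode to (left, right) up to absolute phase."""
    a11r, a11i = abs(A[0][0].real), abs(A[0][0].imag)
    a12r, a12i = abs(A[0][1].real), abs(A[0][1].imag)
    a13r, a13i = abs(A[0][2].real), abs(A[0][2].imag)

    lr, li, rr, ri = left.real, left.imag, right.real, right.imag
    num = a13i * (lr - rr) + a13r * (li + ri)
    den = -a13r * (lr + rr) + a13i * (li - ri)
    eta = math.atan2(num, den)  # tan(eta) = num / den

    # Advance both signals by eta.
    rot = cmath.exp(1j * eta)
    lp, rp = left * rot, right * rot

    d = 2.0 * (a11r * a12i + a11i * a12r)
    w = (a12i * (lp.real + rp.real) - a12r * (lp.imag - rp.imag)) / d
    x = (a11i * (lp.real + rp.real) + a11r * (lp.imag - rp.imag)) / d
    y = (lp.real - rp.real) / (2.0 * a13r)
    if w < 0:  # the other branch of eta (eta + 180 degrees) negates all three
        w, x, y = -w, -x, -y
    return w, x, y


# Round trip: encode a source at 60 degrees, disturb the absolute phase, recover.
left, right = encode(1.0, math.cos(math.radians(60)), math.sin(math.radians(60)))
shift = cmath.exp(1j * math.radians(40))
print(solve_wxy(left * shift, right * shift))
```

The round trip succeeds for any common phase shift because only the interchannel phase and the absolute levels are used.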



Table A1 - 13LP2 Pair-wise panning

W         X          Y          Source Position
0.7039    0.7660     0.6750     L_F
0.7039    0.7660    -0.6750     R_F
0.7135   -0.7765     0.6889     L_B
0.7135   -0.7765    -0.6889     R_B

Appendix 2
Decoding and reproduction 



A number of important aspects of decoding 
are discussed in Sections 2 and 4. This Appendix 
briefly quotes results that are developments of the 
decoding methods already described. 

As stated in Section 4, one philosophy of 
decoding^ dictates that the signal feeds to loud- 
speakers in a rectangular layout should be 



$$
\begin{aligned}
L'_F &= \tfrac{1}{2}\left(kW + \frac{X}{\sqrt{2}\cos\psi} + \frac{Y}{\sqrt{2}\sin\psi}\right) \\
R'_F &= \tfrac{1}{2}\left(kW + \frac{X}{\sqrt{2}\cos\psi} - \frac{Y}{\sqrt{2}\sin\psi}\right) \\
L'_B &= \tfrac{1}{2}\left(kW - \frac{X}{\sqrt{2}\cos\psi} + \frac{Y}{\sqrt{2}\sin\psi}\right) \\
R'_B &= \tfrac{1}{2}\left(kW - \frac{X}{\sqrt{2}\cos\psi} - \frac{Y}{\sqrt{2}\sin\psi}\right)
\end{aligned}
$$

where the loudspeakers are at azimuths $\pm\psi$ and
$180^\circ \pm \psi$ and k is a constant. If W, X and Y are
such that the ratio of W to $X_{\text{peak}}$ is $1/\sqrt{2}$, one
simple decoding method is given when k = 1,
although $k = \sqrt{2}$ also gives good results. A
more sophisticated method that takes some account
of the way in which the ear changes its method of
localisation with frequency, is effected by putting
k = 1 at frequencies less than about 400 Hz and
$k = \sqrt{2}$ at higher frequencies. An overall gain of
0.866 at the higher frequencies should also be
included to preserve equal energy in the two
frequency bands. This is referred to as "psychoacoustic
compensation".

For the more general case of n (> 4) loudspeakers
equally distributed on the circumference
of a circle, with the i'th loudspeaker at azimuth

$$
\psi_i = 360^\circ \times \frac{i}{n},
$$

the speaker feed to the i'th loudspeaker, omitting
psychoacoustic compensation, should be

$$
P_i = W + \sqrt{2}\, X \cos\psi_i + \sqrt{2}\, Y \sin\psi_i
$$

This decoding approach is discussed in Ref. 7. 
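
The two decoding rules can be compared with a short sketch. It computes the four feeds for a rectangular layout with loudspeakers at ±ψ and 180° ± ψ, and the feeds for n loudspeakers equally spaced on a circle. The function names and the dictionary keys are illustrative assumptions; the frequency-dependent choice of k and the 0.866 gain are left to the caller, since a practical decoder would apply them with filters.

```python
import math


def rectangle_feeds(w, x, y, psi_deg, k=1.0):
    """Feeds for loudspeakers at +/-psi and 180 +/- psi degrees."""
    psi = math.radians(psi_deg)
    xs = x / (math.sqrt(2) * math.cos(psi))
    ys = y / (math.sqrt(2) * math.sin(psi))
    return {
        "LF": 0.5 * (k * w + xs + ys),
        "RF": 0.5 * (k * w + xs - ys),
        "LB": 0.5 * (k * w - xs + ys),
        "RB": 0.5 * (k * w - xs - ys),
    }


def regular_array_feeds(w, x, y, n):
    """Feeds P_i for n loudspeakers at azimuths 360 * i / n, i = 1..n,
    omitting psychoacoustic compensation."""
    feeds = []
    for i in range(1, n + 1):
        psi = math.radians(360.0 * i / n)
        feeds.append(w + math.sqrt(2) * (x * math.cos(psi) + y * math.sin(psi)))
    return feeds


# Example values: W at 1/sqrt(2) of the peak of X, source at 45 degrees azimuth.
w = 1 / math.sqrt(2)
x = y = math.cos(math.radians(45))
print(rectangle_feeds(w, x, y, psi_deg=45.0))
print(regular_array_feeds(w, 1.0, 0.0, n=4))
```

For a square layout (ψ = 45°) with k = 1, each rectangular feed is half the value the general expression gives at the same loudspeaker azimuth, so the two rules differ only by an overall gain.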

Pair-wise panning and the associated decoding 
has already been discussed in Section 4.1. An 
account of programme-dependent decoding is given 
in Ref. 2. 

The simple and familiar stereo decoding is* 
given by feeding to the left and right loud- 
speakers, signals 



$$
\begin{aligned}
L' &= \tfrac{1}{2}(W + Y) = \tfrac{1}{2}(M + S) \\
\text{and}\quad R' &= \tfrac{1}{2}(W - Y) = \tfrac{1}{2}(M - S)
\end{aligned}
$$



It is possible that stereo decoding may be improved 
by employing frequency compensation as for 
surround reproduction. Known practical demon- 
strations of this technique, however, have not been 
successful^ . 
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